Output format: `training-output` to build supervised models #801

enoriega · 2023-09-06T21:07:18Z

Summary

Added a new output format suitable for training classifiers in a Python pipeline. It "flattens" activations and regulations into a JSON array that holds the tokens, spans, label, and polarity of each event.

Example

[
  {
    "sentence_tokens" : [ "Notably", ",", "overexpressing", "MafB", "in", "human", "beta-cell", "lines", "(", "beta", "TC3", "cells", ")", "resulted", "in", "increased", "cell", "proliferation", "by", "upregulating", "important", "cell", "cycle", "regulators", ",", "like", "cyclin", "D2", "and", "cyclin", "B", "(", "28", ")", "." ],
    "event_indices" : [ 16, 17, 18, 19, 20, 21, 22, 23 ],
    "type" : "Positive_activation",
    "polarity" : true,
    "controller_indices" : [ 16, 17, 18 ],
    "controlled_indices" : [ 21, 22, 23 ],
    "trigger_indices" : [ 19, 20 ]
  }, {
    "sentence_tokens" : [ "In", "vivo", "glucose", "stimulated", "insulin", "secretion", "(", "GSIS", ")", "experiment", ".." ],
    "event_indices" : [ 2, 3, 4, 5, 6 ],
    "type" : "Positive_activation",
    "polarity" : true,
    "controller_indices" : [ 2, 3 ],
    "controlled_indices" : [ 4, 5, 6 ],
    "trigger_indices" : [ 3, 4 ]
  }
]
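
As a rough illustration of how a Python pipeline might consume this format, the sketch below parses the second record from the example and joins the tokens at each listed index into text. The helper names (`span_tokens`, `to_example`) and the `*_text` output keys are hypothetical and not part of this PR. The index lists are applied literally as positions in `sentence_tokens`, since the PR does not state whether the final index of a span is meant to be inclusive.

```python
import json

# Second record from the example above, copied verbatim.
RAW = """
[
  {
    "sentence_tokens": ["In", "vivo", "glucose", "stimulated", "insulin",
                        "secretion", "(", "GSIS", ")", "experiment", ".."],
    "event_indices": [2, 3, 4, 5, 6],
    "type": "Positive_activation",
    "polarity": true,
    "controller_indices": [2, 3],
    "controlled_indices": [4, 5, 6],
    "trigger_indices": [3, 4]
  }
]
"""

# Field names as they appear in the example output.
SPAN_FIELDS = (
    "event_indices",
    "controller_indices",
    "controlled_indices",
    "trigger_indices",
)


def span_tokens(record, field):
    """Return the tokens at the positions listed in `field` (hypothetical helper)."""
    tokens = record["sentence_tokens"]
    return [tokens[i] for i in record[field]]


def to_example(record):
    """Flatten one record into a dict that a classifier pipeline could consume."""
    example = {
        "tokens": record["sentence_tokens"],
        "label": record["type"],
        "polarity": record["polarity"],
    }
    for field in SPAN_FIELDS:
        # e.g. "controller_indices" -> "controller_text" (naming is an assumption)
        key = field.replace("_indices", "_text")
        example[key] = " ".join(span_tokens(record, field))
    return example


examples = [to_example(record) for record in json.loads(RAW)]
print(examples[0]["label"], "|", examples[0]["controller_text"])
```

Keeping the token list alongside the joined text lets downstream code choose between index-based features and string-based ones.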

# Conflicts:
# bioresources/src/main/resources/org/clulab/reach/kb/NER-Grounding-Override.tsv.gz
# bioresources/src/main/resources/org/clulab/reach/kb/ner/BioProcess.tsv.gz
# bioresources/src/main/resources/org/clulab/reach/kb/ner/CellLine.tsv.gz
# bioresources/src/main/resources/org/clulab/reach/kb/ner/CellType.tsv.gz
# bioresources/src/main/resources/org/clulab/reach/kb/ner/Cellular_component.tsv.gz
# bioresources/src/main/resources/org/clulab/reach/kb/ner/Disease.tsv.gz
# bioresources/src/main/resources/org/clulab/reach/kb/ner/Family.tsv.gz
# bioresources/src/main/resources/org/clulab/reach/kb/ner/Gene_or_gene_product.tsv.gz
# bioresources/src/main/resources/org/clulab/reach/kb/ner/Organ.tsv.gz
# bioresources/src/main/resources/org/clulab/reach/kb/ner/Simple_chemical.tsv.gz
# bioresources/src/main/resources/org/clulab/reach/kb/ner/Site.tsv.gz
# bioresources/src/main/resources/org/clulab/reach/kb/ner/Species.tsv.gz
# bioresources/src/main/resources/org/clulab/reach/kb/ner/TissueType.tsv.gz

kwalcock

The crossScalaVersion does need to be changed for publication.

build.sbt

export/src/main/scala/org/clulab/reach/export/TrainingDataExporter.scala

kwalcock · 2023-09-13T16:06:56Z

@enoriega, this is being built for both Scala 2.11 and 2.12. The earlier version does not accept trailing (dangling) commas like the ones in TrainingDataExporter, so it doesn't compile. To test, one can run `+compile` in sbt, or `++2.11.12` followed by `compile`.

kwalcock · 2023-09-22T23:02:59Z

That TrainingDataExporter still needs a comma removed at line 76 in order to work on Scala 2.11.

enoriega added 30 commits February 14, 2021 20:03

Added new bio processes related to frailty

7b703f5

Merge branch 'master' into frailty

ccc11e5

Progress on the grammar for associations. The use case is frailty

80a9bc4

Progress on the grammar for associations. The use case is frailty

a64f4e3

Progress on the grammar for associations. The use case is frailty

085ecf2

Merge remote-tracking branch 'origin/frailty' into frailty

19f2c88

Added more unit tests for association events

d366998

Restored the frailty related entities to the override file

ca5bbed

Greatly expanded the association unit tests and grammar

cb948d8

Expanded the grammar with more rules and triggers

8e7c9dd

Added a term to the hedging lexicon

6388831

Merge remote-tracking branch 'origin/master' into frailty

42ab64c

Merge branch 'decouple_kb' into frailty

de1b87d

Added support for p values and correlation coefficients

e177df7

Changed the Association unit tests to account for Positive and Negati…

8e5eba6

…ve associations

Added support for confidence intervals

826df27

Process materials and methods too

b55676c

Bug fix for the issue that crashed 1/3 of the papers during assembly

28b3180

Improved polarity detection for associations

ff301a8

Extended the assembly manager to allow bypassing of assembly for the …

25de135

…statistics events. Added support for significance and confidence intervals to the arizona output (and possibly to the CMU by extension)

Changed AZ output to add mark up to visualize the elements of the events

8744a2d

Merge branch 'temp' into frailty

a4c90a8

Fixed runReachCLI.sh

7426870

Merge branch 'master' into frailty

8c767e9

Merge branch 'master' into frailty

74dfd62

Added a new use case to NXML searcher for our project

ce6e1b9

Added docstring

3c2323b

Merge branch 'master' into frailty

3ee7dc7

Merge branch 'serialization' into frailty

f81b3ed

enoriega and others added 10 commits May 30, 2022 19:03

Updates to allow cell types and organs as participants

27a0dab

Added chilton use case and a

5c3a22f

bugfix

6e87e4b

Merge branch 'master' into frailty

064fdd2

Fixed de-serialization bugs

ed01f65

Merge remote-tracking branch 'origin/master' into frailty

d59f218

# Conflicts: # src/main/scala/org/clulab/reach/ReachCLI.scala

Added terms for Skye's search

0f07901

Bumped up the sbt version and scala version to support apple silicon

5cb32c5

Added the "training-data" output format to train classifiers and rela…

1eb6aad

…tion extraction

Fixed a bug that crashed the whole CLI app by throwing an error

4bc7480

enoriega requested a review from kwalcock September 6, 2023 21:07

kwalcock requested changes Sep 12, 2023

Updates from Keith's review

808d130

enoriega requested a review from kwalcock September 22, 2023 22:31

Removed a comma to make it work on Scala 2.11

ade626a

kwalcock approved these changes Sep 22, 2023

enoriega added 12 commits October 5, 2023 18:29

Added additional utilities for exporting training data. This time: Ru…

11e40bc

…le learning

Added visual analytics output

4893638

Added negative examples to the training data exporter

01e5411

Added the missing KB ids to the visual analytics output

fe8cf32

Improved the training data output format

89936c3

Adding article text for the VA project

f01d71f

Added character span for mention elements in VA output

ba4f3b0

Changed the nxml reader version temporarily to a local snapshot

ed0b6a7

Cherry picked the mark up assembly from the frailty branch

1162a7f

Restored AssemblyRow to its previous state and updated build sbt to u…

b54a9d8

…se its published version

Merge branch 'refs/heads/frailty' into enoriega/training_output

9386700

# Conflicts: # main/src/main/resources/application.conf

Added the "is_negated" flag to the outputs

6f07269
